Complexity Constraints in Two-Armed Bandit Problems: An Example
Authors
Abstract
This paper derives the optimal strategy for a two-armed bandit problem under the constraint that the strategy must be implemented by a finite automaton with an exogenously given, small number of states. The idea is to find learning rules for bandit problems that are optimal subject to the constraint that they must be simple. Our main results show that the optimal rule involves an arbitrary initial bias and random experimentation. We also show that the probability of experimentation need not be monotonically increasing in the discount factor, and that very patient decision makers suffer almost no loss from the complexity constraint.
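The abstract's two ingredients, an arbitrary initial bias and random experimentation, can be illustrated with a minimal simulation. The sketch below is not the paper's derivation: it hard-codes a two-state automaton in which each state prescribes an arm, the starting state is an arbitrary bias, and after a failure the automaton switches arms with some probability. The parameter names (`switch_prob`, `horizon`) and the Bernoulli reward model are assumptions for illustration only.

```python
import random

def simulate(p_arms, switch_prob, horizon, seed=0):
    """Run a two-state automaton on a two-armed Bernoulli bandit.

    Each automaton state prescribes pulling one arm; the starting state
    is an arbitrary initial bias.  After a failure (reward 0) the
    automaton switches to the other arm with probability switch_prob,
    a randomized form of experimentation.  Returns the total reward.
    """
    rng = random.Random(seed)
    state = 0  # arbitrary initial bias toward arm 0
    total = 0
    for _ in range(horizon):
        reward = 1 if rng.random() < p_arms[state] else 0
        total += reward
        if reward == 0 and rng.random() < switch_prob:
            state = 1 - state  # random experimentation: try the other arm
    return total

# Average per-period reward on an asymmetric bandit (arm 1 is better).
avg = simulate([0.3, 0.7], switch_prob=0.5, horizon=10_000) / 10_000
```

With these parameters the induced two-state Markov chain spends most of its time on the better arm, so the long-run average reward lies strictly between the two arms' success probabilities and above their midpoint.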
Similar Articles
Cognitive Capacity and Choice under Uncertainty: Human Experiments of Two-armed Bandit Problems
The two-armed bandit problem, or more generally the multi-armed bandit problem, has been identified as the underlying problem in many practical settings that involve making a series of choices among uncertain alternatives. Problems like job searching, customer switching, and even the adoption of fundamental or technical trading strategies by traders in financial markets can be formulate...
The Max K-Armed Bandit: A New Model of Exploration Applied to Search Heuristic Selection
The multiarmed bandit is often used as an analogy for the tradeoff between exploration and exploitation in search problems. The classic problem involves allocating trials to the arms of a multiarmed slot machine to maximize the expected sum of rewards. We pose a new variation of the multiarmed bandit—the Max K-Armed Bandit—in which trials must be allocated among the arms to maximize the expecte...
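The contrast between the classic expected-sum objective and the max objective described above can be seen in a small simulation. The sketch below is illustrative, not from the cited paper: it compares two hypothetical arms with equal mean but different spread, showing that under the max-of-trials objective the high-variance arm is more attractive even though the two are equivalent under the expected-sum objective. The Gaussian reward model and all names here are assumptions.

```python
import random

def best_of_trials(dist, n, rng):
    """Return the maximum of n draws from a reward distribution."""
    return max(dist(rng) for _ in range(n))

rng = random.Random(1)

# Two hypothetical arms: identical mean reward, different variance.
low_var  = lambda r: r.gauss(0.5, 0.05)
high_var = lambda r: r.gauss(0.5, 0.30)

# Average best-of-20 reward over many runs: the high-variance arm wins
# under the max objective, while both arms tie under the sum objective.
avg_max_low  = sum(best_of_trials(low_var, 20, rng) for _ in range(2000)) / 2000
avg_max_high = sum(best_of_trials(high_var, 20, rng) for _ in range(2000)) / 2000
```

This is why max-K-armed allocation rules tend to favor heavy-tailed or high-variance arms that a sum-maximizing rule would treat as no better than their mean.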
Learning and animal behavior: exploring the dynamics of simple models
Introduction All living organisms must interact with an external environment and should respond to it in a way that maximizes their probability of reproduction and survival. If an organism can learn, it will be able to modify its behavior based on environmental feedback and potentially increase its survival probability. The processes underlying learning and behavior are of interest to researchers ...
Enhancing Evolutionary Optimization in Uncertain Environments by Allocating Evaluations via Multi-armed Bandit Algorithms
Optimization problems with uncertain fitness functions are common in the real world and present unique challenges for evolutionary optimization approaches. Existing issues include excessively expensive evaluation, lack of solution reliability, and an inability to maintain high overall fitness during optimization. Using conversion rate optimization as an example, this paper proposes a series...
Asymptotic Allocation Rules for a Class of Dynamic Multi-armed Bandit Problems
This paper presents a class of dynamic multi-armed bandit problems where the reward can be modeled as the noisy output of a time-varying linear stochastic dynamic system that satisfies some boundedness constraints. The class allows many seemingly different problems with time-varying option characteristics to be considered in a single framework. It also opens up the possibility of considering ma...
Journal:
Volume, Issue
Pages -
Publication date: 2005